
Autoregressive Score Matching

Neural Information Processing Systems

Autoregressive models use the chain rule to define a joint probability distribution as a product of conditionals. These conditionals need to be normalized, imposing constraints on the functional families that can be used. To increase flexibility, we propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores), which need not be normalized. To train AR-CSM, we introduce a new divergence between distributions named Composite Score Matching (CSM). For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training. Compared to previous score matching algorithms, our method is more scalable to high-dimensional data and more stable to optimize. We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
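The two ingredients the abstract combines can be seen in a toy example. Below is a minimal sketch (not the paper's neural AR-CSM) using an autoregressive Gaussian model with linear conditional means: the joint log-density decomposes by the chain rule into a sum of univariate log-conditionals, and each conditional score d/dx_i log p(x_i | x_<i) is a simple univariate function. The names `ar_log_prob` and `conditional_score`, the coefficient `ALPHA`, and the linear mean are illustrative assumptions, not from the paper.

```python
import numpy as np

# Toy autoregressive Gaussian model:
#   p(x) = prod_i N(x_i | mu_i(x_<i), SIGMA^2),
# where each conditional mean depends linearly on the previous coordinate.
# ALPHA and the linear form are assumed for illustration only.
SIGMA = 1.0
ALPHA = 0.5

def cond_mean(x, i):
    # Mean of p(x_i | x_<i); here it depends only on x_{i-1}.
    return ALPHA * x[i - 1] if i > 0 else 0.0

def ar_log_prob(x):
    # Chain rule: log p(x) = sum_i log p(x_i | x_<i).
    logp = 0.0
    for i in range(len(x)):
        mu = cond_mean(x, i)
        logp += -0.5 * np.log(2 * np.pi * SIGMA**2) \
                - (x[i] - mu) ** 2 / (2 * SIGMA**2)
    return logp

def conditional_score(x, i):
    # Univariate conditional score: d/dx_i log p(x_i | x_<i).
    # AR-CSM parameterizes these scores directly, so the normalizing
    # constant of each conditional is never needed.
    return -(x[i] - cond_mean(x, i)) / SIGMA**2
```

For the last coordinate, the conditional score coincides with the derivative of the joint log-density, which gives a quick finite-difference check of the decomposition. A trained AR-CSM replaces these closed-form scores with learned, unnormalized score networks matched to the data via the CSM divergence.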


Review for NeurIPS paper: Autoregressive Score Matching

Neural Information Processing Systems

Weaknesses: EXPERIMENTS * Section 5.1, Figure 2 - Given that the vertical axes are on completely different scales, it's unclear what this comparison shows. Even though the loss curve for CSM plateaus more quickly than DSM, that doesn't necessarily imply that the trained model achieves better density estimation performance. The paper claims "less shifted colors" when using CSM, but there doesn't seem to be a noticeable difference between MLE and CSM for CelebA. So without any comparison to other methods or a quantitative metric (such as PSNR), the denoising results only seem to serve as a quick sanity check. A more thorough experiment would be necessary to demonstrate that the models trained under CSM are "sufficiently expressive to capture complex distributions and solve difficult tasks." * Section 6 - The NLL and FID improvements are very small.

